AutoMarkup: A Tool for Automatically Marking up Text Documents
Abstract
In this paper we present a novel system that automatically marks up text documents in XML. The system uses the Self-Organizing Map (SOM) algorithm to organize marked-up documents on a map so that similar documents are placed at nearby locations. Then, using the inductive learning algorithm C5, it automatically generates markup rules from the nearest SOM neighbours of an unmarked document and applies them to that document. The system is adaptive: it learns from errors in the automatically marked-up documents to improve its accuracy. The automatically marked-up documents are in turn arranged on the SOM.
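The first two stages described above, arranging marked-up documents on a SOM and then retrieving the nearest marked neighbours of an unmarked document, can be sketched as follows. This is a minimal illustration, assuming documents are represented as unit-length term-frequency vectors over a shared vocabulary; the featurisation, the 3-by-3 grid, the learning-rate and radius schedules, the toy documents, and the names `bag_of_words`, `SOM` and `nearest_marked` are hypothetical and not taken from the paper. The C5 rule-induction and markup step is not reproduced: the neighbour indices returned here would only be the training examples from which such rules could be learned.

```python
import math
import random
from collections import Counter


def bag_of_words(text, vocab):
    """Unit-length term-frequency vector of `text` over a fixed vocabulary (assumed featurisation)."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class SOM:
    """A small rectangular Self-Organizing Map with a Gaussian neighbourhood."""

    def __init__(self, rows, cols, dim, seed=0):
        rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        # One randomly initialised weight vector per grid cell (row, col).
        self.weights = {(r, c): [rng.random() for _ in range(dim)]
                        for r in range(rows) for c in range(cols)}

    def bmu(self, x):
        """Best-matching unit: the grid cell whose weights are closest to x."""
        return min(self.weights, key=lambda cell: sq_dist(self.weights[cell], x))

    def train(self, data, epochs=50, lr0=0.5, radius0=None):
        """Online SOM training with linearly decaying learning rate and radius."""
        if radius0 is None:
            radius0 = max(self.rows, self.cols) / 2
        total = epochs * len(data)
        step = 0
        for _ in range(epochs):
            for x in data:
                frac = step / total
                lr = lr0 * (1 - frac)
                radius = max(radius0 * (1 - frac), 0.5)
                br, bc = self.bmu(x)
                # Pull every cell towards x, weighted by its grid distance to the BMU.
                for (r, c), w in self.weights.items():
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                    for i in range(len(w)):
                        w[i] += lr * h * (x[i] - w[i])
                step += 1


def nearest_marked(som, marked_vecs, query_vec, k=3):
    """Indices of the k marked documents mapped closest (on the grid) to the query."""
    qr, qc = som.bmu(query_vec)

    def grid_dist(i):
        r, c = som.bmu(marked_vecs[i])
        return (r - qr) ** 2 + (c - qc) ** 2

    # sorted() is stable, so ties are broken by document index.
    return sorted(range(len(marked_vecs)), key=grid_dist)[:k]


# Toy usage with two hypothetical document types (invoices and recipes).
vocab = ["invoice", "total", "date", "recipe", "oven", "flour"]
marked_texts = [
    "invoice total date",
    "invoice date total total",
    "recipe oven flour",
    "recipe flour flour oven",
]
marked = [bag_of_words(t, vocab) for t in marked_texts]
som = SOM(3, 3, len(vocab), seed=42)
som.train(marked)
query = bag_of_words("oven recipe flour", vocab)
print(nearest_marked(som, marked, query, k=2))
```

Ranking neighbours by the distance between their map cells, rather than directly in the vector space, follows the abstract's use of SOM neighbourhoods. Re-arranging newly marked-up documents on the map, the final step in the abstract, could be approximated in this sketch by calling `train` again with those documents included.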
Similar resources
Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Besides its own merits, text classification (TC) has become a cornerstone of many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to automatically create modules that would help speed up the classification process for any text categorization task. It also serves as a tool for...
Automating XML markup of text documents
We present a novel system for automatically marking up text documents into XML and discuss the benefits of XML markup for intelligent information retrieval. The system uses the Self-Organizing Map (SOM) algorithm to arrange XML marked-up documents on a two-dimensional map so that similar documents appear closer to each other. It then employs an inductive learning algorithm C5 to automatically ex...
KAT: an Annotation Tool for STEM Documents
Considering the constantly growing body of mathematical knowledge, it becomes increasingly difficult for individuals to take full advantage of all the information available. To semantify documents, we need to be able to create annotations efficiently and conveniently (marking definitions or declarations as well as usages of concepts) on a large corpus of documents. Eventually this can be achieved a...
RSTTool 2.4 - A markup Tool for Rhetorical Structure Theory
RSTTool is a graphical tool for annotating a text in terms of its rhetorical structure. The demonstration will show the various interfaces of the tool, focusing on its ease of use. 1. Introduction: This paper describes the RSTTool, a graphical interface for marking up the structure of text. While primarily intended to be used for marking up Rhetorical Structure (cf. Rhetorical Structur...
Automating XML Markup using Machine Learning Techniques
In this paper we present a novel system for automatically marking up text documents into XML. The system uses the techniques of the Self-Organising Map (SOM) algorithm in conjunction with an inductive learning algorithm, C5.0. The SOM algorithm clusters the XML marked-up documents on a two-dimensional map such that documents having similar content are placed close to each other. The C5.0 algori...